claim frequency


Explainable Boosting Machine for Predicting Claim Severity and Frequency in Car Insurance

Krùpovà, Markéta, Rachdi, Nabil, Guibert, Quentin

arXiv.org Machine Learning

In a context of ever-increasing competition and heightened regulatory pressure, accuracy, actuarial precision, as well as transparency and understanding of the tariff, are key issues in non-life insurance. Traditionally used generalized linear models (GLMs) result in a multiplicative tariff that favors interpretability. With the rapid development of machine learning and deep learning techniques, actuaries and the rest of the insurance industry have adopted these techniques widely. However, they need to be paired with interpretability techniques. In this paper, our study focuses on introducing an Explainable Boosting Machine (EBM) model that combines intrinsically interpretable characteristics with high predictive performance. This approach is described as a glass-box model and relies on a Generalized Additive Model (GAM) fitted with a cyclic gradient boosting algorithm. It accounts for univariate effects and pairwise interaction effects between features and naturally provides explanations of them. We implement this approach on car insurance frequency and severity data and extensively compare its performance with classical competitors: a GLM, a GAM, a CART model and an Extreme Gradient Boosting (XGB) algorithm. Finally, we examine the interpretability of these models to capture the main determinants of claim costs.
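The cyclic boosting idea behind EBM can be illustrated in a few lines. What follows is a minimal sketch, not the authors' implementation and not the `interpret` library: it fits one piecewise-constant shape function per feature under a Poisson log link, cycling through the features round-robin with a small learning rate. Fixed quantile bins stand in for the shallow per-feature trees that EBM grows, pairwise interactions are omitted, and the helper names, bin count and learning rate are hypothetical choices.

```python
import numpy as np


def fit_cyclic_gam(X, y, exposure, n_bins=8, n_rounds=200, lr=0.1):
    """Toy cyclic boosting of per-feature shape functions (Poisson, log link).

    Hypothetical helper: quantile bins replace EBM's shallow trees and
    pairwise interaction terms are left out.
    """
    n, d = X.shape
    # Interior quantile cut points per feature -> bin indices 0..n_bins-1.
    cuts = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = [np.quantile(X[:, j], cuts) for j in range(d)]
    bins = np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(d)])

    intercept = np.log(y.sum() / exposure.sum())
    shapes = np.zeros((d, n_bins))
    score = intercept + np.log(exposure)  # log(exposure) enters as an offset

    for _ in range(n_rounds):
        for j in range(d):  # round-robin: one small update per feature
            mu = np.exp(score)
            grad = np.bincount(bins[:, j], weights=mu - y, minlength=n_bins)
            hess = np.bincount(bins[:, j], weights=mu, minlength=n_bins)
            step = -lr * grad / (hess + 1e-12)  # damped Newton step per bin
            shapes[j] += step
            score += step[bins[:, j]]
    return intercept, edges, shapes


def predict(intercept, edges, shapes, X, exposure):
    """Expected claim counts: exposure * exp(intercept + sum_j f_j(x_j))."""
    score = intercept + np.log(exposure)
    for j in range(X.shape[1]):
        score = score + shapes[j][np.digitize(X[:, j], edges[j])]
    return np.exp(score)


rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 2))
exposure = rng.uniform(0.5, 1.0, size=5000)
# Simulated truth: only the first feature drives claim frequency.
y = rng.poisson(exposure * np.exp(-2.0 + 1.5 * X[:, 0]))

intercept, edges, shapes = fit_cyclic_gam(X, y, exposure)
pred = predict(intercept, edges, shapes, X, exposure)
print(np.round(shapes, 2))  # row j is the fitted shape function of feature j
```

Because each update touches only one feature's shape function, the fitted model remains a sum of one-dimensional functions that can be plotted directly, which is where the glass-box interpretability comes from.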


From point to probabilistic gradient boosting for claim frequency and severity prediction

Chevalier, Dominik, Côté, Marie-Pier

arXiv.org Machine Learning

Gradient boosting algorithms for decision trees are increasingly used in actuarial applications, as they show superior predictive performance over traditional generalized linear models. Many improvements and refinements of the original gradient boosting machine algorithm exist. We present, in a unified notation, and contrast all the existing point and probabilistic gradient boosting algorithms for decision trees: GBM, XGBoost, DART, LightGBM, CatBoost, EGBM, PGBM, XGBoostLSS, cyclic GBM, and NGBoost. In this comprehensive numerical study, we compare their performance on five publicly available datasets for claim frequency and severity, of various sizes and comprising different numbers of (high-cardinality) categorical variables. We explain how varying exposure-to-risk can be handled with boosting in frequency models. We compare the algorithms on the basis of computational efficiency, predictive performance, and model adequacy. LightGBM and XGBoostLSS win in terms of computational efficiency. The fully interpretable EGBM achieves competitive predictive performance compared to the black-box algorithms considered. We find that there is no trade-off between model adequacy and predictive accuracy: both are achievable simultaneously.
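One common way to handle varying exposure in a Poisson boosting model is to add log(exposure) as an offset to the raw score; an equivalent alternative is to model the observed rate y/exposure with exposure as the sample weight. The sketch below, assuming a Poisson log-link loss and hypothetical helper names, checks numerically that both formulations give the same gradient and Hessian with respect to the raw score f, so a tree learner sees identical split statistics. In practice the offset route usually goes through XGBoost's `base_margin` or LightGBM's `init_score`; the abstract does not say which route the authors recommend.

```python
import numpy as np


def grad_hess_offset(f, y, exposure):
    """Poisson NLL with mu = exposure * exp(f): first and second derivative in f."""
    mu = exposure * np.exp(f)
    return mu - y, mu


def grad_hess_weighted(f, y, exposure):
    """Poisson NLL on the rate y / exposure, each row weighted by exposure."""
    rate = y / exposure
    lam = np.exp(f)
    return exposure * (lam - rate), exposure * lam


rng = np.random.default_rng(1)
exposure = rng.uniform(0.1, 1.0, size=10)
y = rng.poisson(0.3 * exposure).astype(float)
f = rng.normal(size=10)  # arbitrary raw scores from some boosting iteration

g1, h1 = grad_hess_offset(f, y, exposure)
g2, h2 = grad_hess_weighted(f, y, exposure)
print(np.allclose(g1, g2), np.allclose(h1, h2))  # True True
```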


Bayesian CART models for insurance claims frequency

Zhang, Yaojun, Ji, Lanpeng, Aivaliotis, Georgios, Taylor, Charles

arXiv.org Machine Learning

Accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders that reflect their risk. In recent years, classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good predictive performance and are relatively easy to interpret. In this paper, we introduce Bayesian CART models for insurance pricing, with a particular focus on claims frequency modelling. In addition to the Poisson and negative binomial (NB) distributions commonly used for claims frequency, we implement Bayesian CART for the zero-inflated Poisson (ZIP) distribution to address the difficulty arising from imbalanced insurance claims data. To this end, we introduce a general MCMC algorithm using data augmentation methods for posterior tree exploration. We also introduce the deviance information criterion (DIC) for tree model selection. The proposed models are able to identify trees that better classify policy-holders into risk groups. Simulations and real insurance data are discussed to illustrate the applicability of these models.
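The zero-inflated Poisson distribution and the data-augmentation idea can be made concrete in a short sketch. Under a ZIP with structural-zero probability $\pi$ and Poisson mean $\lambda$, an observed zero is either structural or a Poisson zero; data augmentation imputes that latent indicator from its conditional probability $\pi / (\pi + (1-\pi)e^{-\lambda})$, so that given the indicator the likelihood splits into simple Bernoulli and Poisson parts. The function names and the single $(\lambda, \pi)$ pair (as if for one tree leaf) are illustrative assumptions, not the authors' MCMC sampler.

```python
import math
import random


def zip_pmf(k, lam, pi):
    """P(N = k) for a zero-inflated Poisson with structural-zero probability pi."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1 - pi) * poisson


def prob_structural_zero(lam, pi):
    """P(latent structural zero | N = 0), used to impute the missing indicator."""
    return pi / (pi + (1 - pi) * math.exp(-lam))


def augment(counts, lam, pi, rng):
    """One data-augmentation draw of the latent indicator for each observation.

    Positive counts can only come from the Poisson part, so their indicator is 0.
    """
    p = prob_structural_zero(lam, pi)
    return [int(n == 0 and rng.random() < p) for n in counts]


rng = random.Random(42)
counts = [0, 0, 0, 1, 0, 2, 0, 0]
z = augment(counts, lam=0.2, pi=0.6, rng=rng)
print(z)  # 1 marks an observation imputed as a structural zero
```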


Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff

Holvoet, Freek, Antonio, Katrien, Henckaerts, Roel

arXiv.org Artificial Intelligence

Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction, established with either a GLM or a GBM, with a neural network correction. We explain the data preprocessing steps with a specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes and numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network, and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. The result is a technical tariff table that can easily be deployed in practice.
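The CANN idea of a baseline plus neural correction can be sketched as a skip connection on the log scale: the network adds a correction to the baseline log-mean, $\log \mu(x) = \log \mu_{\text{GLM}}(x) + \text{NN}(x)$. Initialising the output layer to zero makes the untrained CANN reproduce the baseline exactly, so training starts from the GLM (or GBM) tariff. This is a minimal NumPy forward pass with a hypothetical one-hidden-layer network; the layer sizes, tanh activation and helper names are assumptions, not the authors' architecture.

```python
import numpy as np


def cann_forward(x, baseline_log_mu, params):
    """Mean prediction = baseline on the log scale + neural-network correction."""
    W1, b1, w2, b2 = params
    hidden = np.tanh(x @ W1 + b1)
    correction = hidden @ w2 + b2
    return np.exp(baseline_log_mu + correction)


def init_params(n_features, n_hidden, rng):
    """Random hidden layer; zero output layer so the correction starts at 0."""
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = np.zeros(n_hidden)
    b2 = 0.0
    return W1, b1, w2, b2


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
baseline_log_mu = np.log([0.05, 0.12, 0.08, 0.20])  # e.g. from a fitted GLM
params = init_params(n_features=3, n_hidden=5, rng=rng)
print(cann_forward(x, baseline_log_mu, params))  # equals the baseline (up to rounding) before training
```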


Enhanced Gradient Boosting for Zero-Inflated Insurance Claims and Comparative Analysis of CatBoost, XGBoost, and LightGBM

So, Banghee

arXiv.org Artificial Intelligence

The property and casualty (P&C) insurance industry faces challenges in developing claim predictive models due to the highly right-skewed distribution of positive claims with excess zeros. To address this, actuarial science researchers have employed "zero-inflated" models that combine a traditional count model and a binary model. This paper investigates the use of boosting algorithms to process insurance claim data, including zero-inflated telematics data, to construct claim frequency models. Three popular gradient boosting libraries (XGBoost, LightGBM, and CatBoost) are evaluated and compared to determine the most suitable library for training on insurance claim data and fitting actuarial frequency models. Through a comprehensive analysis of two distinct datasets, CatBoost is found to be the best for developing auto claim frequency models based on predictive performance. Furthermore, we propose a new zero-inflated Poisson boosted tree model, with variations in the assumed relationship between the inflation probability $p$ and the distribution mean $\mu$, and find that it outperforms others depending on data characteristics. This model enables us to take advantage of particular CatBoost tools, which make it easier and more convenient to investigate the effects and interactions of various risk features on the frequency model when using telematics data.
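Boosting a ZIP model with a gradient boosting library requires the gradient of the ZIP negative log-likelihood with respect to the raw score. The sketch below assumes the simplest variant, in which the score is $f = \log \lambda$ for the Poisson component and the inflation probability $p$ is a fixed constant; the paper's variants that tie $p$ to $\mu$ would change these formulas. The helper names are hypothetical, the $\log y!$ term is dropped because it does not depend on $f$, and the analytic gradient is checked against a central finite difference.

```python
import numpy as np


def zip_nll(f, y, p):
    """ZIP negative log-likelihood per row, up to the log(y!) constant.

    f = log(lambda) is the raw boosting score; p is a fixed inflation probability.
    """
    lam = np.exp(f)
    nll_zero = -np.log(p + (1 - p) * np.exp(-lam))
    nll_pos = -(np.log(1 - p) - lam + y * f)
    return np.where(y == 0, nll_zero, nll_pos)


def zip_grad(f, y, p):
    """Analytic d(nll)/df, the first-order term a custom boosting objective supplies."""
    lam = np.exp(f)
    p0 = p + (1 - p) * np.exp(-lam)  # P(N = 0)
    grad_zero = (1 - p) * np.exp(-lam) * lam / p0
    grad_pos = lam - y
    return np.where(y == 0, grad_zero, grad_pos)


rng = np.random.default_rng(3)
y = np.array([0, 0, 1, 0, 3, 0, 2], dtype=float)
f = rng.normal(scale=0.5, size=y.size)
p, eps = 0.4, 1e-6

numeric = (zip_nll(f + eps, y, p) - zip_nll(f - eps, y, p)) / (2 * eps)
print(np.max(np.abs(numeric - zip_grad(f, y, p))))  # close to zero
```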


Comparative Safety Performance of Autonomous- and Human Drivers: A Real-World Case Study of the Waymo One Service

Di Lillo, Luigi, Gode, Tilia, Zhou, Xilin, Atzei, Margherita, Chen, Ruoshu, Victor, Trent

arXiv.org Artificial Intelligence

This study compares the safety of autonomous and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers, as measured via collision causation. The result is obtained by comparing Waymo's third-party liability insurance claims data with mileage- and zip-code-calibrated Swiss Re (human driver) private passenger vehicle baselines. A liability claim is a request for compensation when someone is responsible for damage to property or injury to another person, typically following a collision. Liability claims reporting and claim development follow insurance industry best practices to assess crash causation contribution and predict future crash contributions. In over 3.8 million miles driven without a human being behind the steering wheel in rider-only (RO) mode, the Waymo Driver incurred zero bodily injury claims, compared with the human driver baseline of 1.11 claims per million miles (cpmm). The Waymo Driver also significantly reduced property damage claims to 0.78 cpmm, compared with the human driver baseline of 3.26 cpmm. Similarly, in a more statistically robust dataset of over 35 million miles during autonomous testing operations (TO), the Waymo Driver, together with a human autonomous specialist behind the steering wheel monitoring the automation, also significantly reduced both bodily injury and property damage cpmm compared to the human driver baselines.
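The headline figures can be put in perspective with back-of-the-envelope arithmetic. The sketch below computes the relative reduction in property damage claim frequency and, treating the human baseline of 1.11 bodily injury cpmm as an exact Poisson rate over exactly 3.8 million miles (the text says "over 3.8 million"), the probability of observing zero bodily injury claims by chance. This is an illustrative calculation with hypothetical helper names, not the statistical test used in the study; it ignores uncertainty in the baseline and in claim development.

```python
import math


def relative_reduction(rate_av, rate_human):
    """Fractional reduction of the autonomous rate relative to the human baseline."""
    return 1 - rate_av / rate_human


def poisson_prob_zero(rate_per_million, million_miles):
    """P(0 claims) if claims follow a Poisson process with the given cpmm rate."""
    return math.exp(-rate_per_million * million_miles)


pd_reduction = relative_reduction(0.78, 3.26)  # property damage, RO mode
expected_bi = 1.11 * 3.8                       # bodily injury claims expected at baseline
p_zero_bi = poisson_prob_zero(1.11, 3.8)

print(f"PD reduction: {pd_reduction:.1%}")            # PD reduction: 76.1%
print(f"Expected BI claims: {expected_bi:.2f}")       # Expected BI claims: 4.22
print(f"P(0 BI claims | baseline): {p_zero_bi:.4f}")  # P(0 BI claims | baseline): 0.0147
```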


Frequency-Severity Experience Rating based on Latent Markovian Risk Profiles

Verschuren, Robert Matthijs

arXiv.org Artificial Intelligence

Bonus-Malus Systems (BMSs) are nowadays widely employed in automobile insurance to dynamically adjust a premium based on a customer's claims experience. The intuition behind these posterior ratemaking systems is that as we observe more claiming behavior, we learn more about the underlying risk profile. These systems are therefore a commercially attractive form of experience rating, in which we correct the prior premium for past claims to reflect our updated beliefs about a customer's risk profile. Moreover, they traditionally consider a customer's number of claims irrespective of their sizes and thus implicitly assume independence between the claim counts and sizes (Hey, 1970; Denuit et al., 2007; Boucher and Inoussa, 2014; Verschuren, 2021). Alternative Bayesian forms of experience rating likewise typically depend only on the frequency component, or consider the two components separately (see, e.g., Denuit and Lang (2004); Bühlmann and Gisler (2005); Mahmoudvand and Hassani (2009); Bermúdez and Karlis (2011, 2017)).
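A classic frequency-only Bayesian experience-rating scheme is the Poisson-gamma credibility model: a customer's yearly claim counts are Poisson with mean $\lambda_t \Theta$, where $\lambda_t$ is the a priori expected frequency and the latent risk factor $\Theta$ follows a Gamma$(a, a)$ prior with mean 1. The posterior mean of $\Theta$, which multiplies the prior premium, is $(a + \sum_t N_t)/(a + \sum_t \lambda_t)$. The sketch below is this textbook model, not the latent Markovian risk-profile model of the paper; the value of $a$, the premium and the helper names are hypothetical. Like a traditional BMS, it uses only claim counts and ignores claim sizes.

```python
def posterior_risk_factor(claims, expected, a):
    """Posterior mean of Theta ~ Gamma(shape=a, rate=a) given Poisson claim counts.

    claims[t] ~ Poisson(expected[t] * Theta); only counts enter, not claim sizes.
    """
    return (a + sum(claims)) / (a + sum(expected))


def experience_rated_premium(prior_premium, claims, expected, a=2.0):
    """Correct the a priori premium by the posterior risk factor."""
    return prior_premium * posterior_risk_factor(claims, expected, a)


expected = [0.1, 0.1, 0.1]  # a priori yearly claim frequencies
print(experience_rated_premium(500.0, [0, 0, 0], expected))  # claim-free: discount
print(experience_rated_premium(500.0, [0, 1, 0], expected))  # one claim: surcharge
```

Larger values of $a$ express more confidence in the a priori tariff, so the posterior factor moves away from 1 more slowly as claims accumulate.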


Swiss Re leveraging machine learning to predict motor frequency developments - Reinsurance News


By utilising machine learning and numerical text processing techniques, Swiss Re has been able to generate a "predictive view" of motor frequency developments in several markets. In a recent conversation with Nikita Kuksin, head of modelling within Casualty R&D, Miriam Hook, vice president, Global Clients, and Surbhi Gupta, assistant vice president, Casualty R&D at Swiss Re, it was explained to us how these alternative approaches were able to provide added granularity to existing data. "We intended to develop an alternative to traditional actuarial calculation methods that would give us an 'external perspective' on claims frequency within our motor portfolio and allow us to predict motor frequency developments in several motor markets," said Kuksin, who leads the modelling team within the casualty research and development department at the Swiss Re Institute. Gupta, who prior to her current role served at Swiss Re for three years as a data scientist, explained how these methods were brought to fruition: first by checking the status quo of frequency developments against external data, then by explaining motor frequency with external data to generate factors that could be projected into the future. "These are complex objectives, requiring solid data sets and robust analytics," Gupta explained.